Introducing the Austrian Baroque Corpus: Annotation and Application of a Thematic Research Collection
Abstract
This paper gives an overview of a relatively new thematic corpus based on German sacred literature of the Baroque period. At present, the digital collection consists of several texts specific to the memento mori genre. All texts in the Austrian Baroque Corpus (ABaC:us) have been enriched with different layers of structural information and tagged using automated tools adapted to the specific needs of the language of the period. One important achievement of the project is that every historic word form occurring in the texts has been electronically mapped to its corresponding lemma in High German and corrected or verified by domain experts. In all phases of the workflow, the interdisciplinary team (literary, linguistic, and text technology specialists) insisted on high-quality linguistic and semantic annotation and worked towards creating a sound basis that would allow for more sophisticated research questions. The current version of the interface can be seen as a case example showing how the ABaC:us team provides improved access to these rare pieces of macabre literature, which give fascinating evidence of Baroque culture and attitudes towards life and death.
Related resources
Porting Elements of the Austrian Baroque Corpus onto the Linguistic Linked Open Data Format
We describe work on porting linguistic and semantic annotation applied to the Austrian Baroque Corpus (ABaC:us) to a format supporting its publication in the Linked Open Data Framework. This work includes several aspects, such as a derived lexicon of old forms used in the texts and their mapping to modern German lemmas, the description of morphosyntactic features and the building of domain-specific...
The AAC [Austrian Academy Corpus] - An Enterprise to Develop Large Electronic Text Corpora
The AAC [Austrian Academy Corpus] is a corpus research institution based at the Austrian Academy of Sciences in Vienna. The AAC is a very large and complex electronic text collection. Its aims are to create an innovative text corpus and to conduct scholarly and scientific research in the field of electronic text corpora. In the first phase of the corpus build up the AAC is committed to have at ...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
ZT Corpus: Annotation and Tools for Basque Corpora
The ZT Corpus (Basque Corpus of Science and Technology) is a tagged collection of specialised texts in Basque, which aims to be a major resource in research and development with respect to written technical Basque: terminology, syntax and style. It was released in December 2006 and can be queried at http://www.ztcorpusa.net. The ZT Corpus stands out among other Basque corpora for many reasons: ...
The development of the spoken corpus of Japanese learner English
1. Introduction To keep up with the information-driven society, it must be one of the most important things to acquire foreign languages, especially English for international communications. In order to develop a computer-assisted language teaching and learning environment, we have been compiling a large-scale speech corpus of Japanese learner English, which provides a lot of useful information...
Publication date: 2013